In this Take-home Exericse, hedonic pricing models will be build to explain the factors affecting resale prcies of public housing in Singapore. The hedonic price models will be built using approriate Geographical Weighted Regression (GWR) methods.
When buying a house, it is generally a major investment for most people. There are many factors that affects the housing prices, such as inflation rates, economy of country, which are more global in nature. There are other factors that are specific to the properties itself also, which can be further divided to structural and locational factors. Structural factors are variables related to the property themselves such as the size, fitting, and tenure of the property. Locational factors are variables related to the neighbourhood of the properties such as proximity to childcare centre, public transport service and shopping centre.
The effect of housing factors in relation to price will be examined using hedonic pricing models. Ordinary Least Square (OLS) method is used to built the model, however it fails to take into consideration that spatial autocorrelation and spatial heterogeneity exist in geographic data sets such as housing transactions. The estimation of OLS of hedonic pricing models could lead to biased, inconsistent or inefficient results. GWR was introduced to calibrate hedonic price model for housing to due with this limitation.
To build hedonic pricing models to explain factors affecting the resale prices of public housing in Singapore. Appropriare GWR methods must be used to built these models.
olsrr: for building OLS and performing diagnostics tests
GWmodel: to calibrate geographical weighted family of models
corrplot: for multivariate data visualisation and analysis
tmap: to plot cartographic quality static point patterns maps or interactive maps by using leaflet API
httr: Useful tools for working with HTTP organised by HTTP verbs (GET(), POST(), etc)
sf: used to import, manage and process vector-based geospatial data in R.
tidyverse: attribute data handling
packages = c('olsrr', 'corrplot', 'ggpubr', 'sf', 'spdep', 'GWmodel', 'tmap', 'tidyverse', 'httr', 'sp')
for (p in packages){
if(!require(p, character.only = T)){
install.packages(p)
}
library(p,character.only = T)
}
childcare_sf <- st_read("data/geospatial/child-care-services-geojson.geojson")
Reading layer `child-care-services-geojson' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\child-care-services-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 1545 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.248403 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
eldercare_sf <- st_read(dsn = "data/geospatial", layer="ELDERCARE")
Reading layer `ELDERCARE' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 120 features and 19 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21 / Singapore TM
mrtlrt_sf <- st_read(dsn = "data/geospatial", layer="MRTLRTStnPtt")
Reading layer `MRTLRTStnPtt' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 171 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21
park_sf <- st_read("data/geospatial/parks-geojson.geojson")
Reading layer `parks-geojson' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\parks-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 350 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6929 ymin: 1.214058 xmax: 104.0017 ymax: 1.461503
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
presch_sf <- st_read("data/geospatial/pre-schools-location-geojson.geojson")
Reading layer `pre-schools-location-geojson' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\pre-schools-location-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 1925 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6824 ymin: 1.247759 xmax: 103.9897 ymax: 1.462134
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
supermarket_sf <- st_read("data/geospatial/supermarkets-geojson.geojson")
Reading layer `supermarkets-geojson' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial\supermarkets-geojson.geojson'
using driver `GeoJSON'
Simple feature collection with 526 features and 2 fields
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 103.6258 ymin: 1.24715 xmax: 104.0036 ymax: 1.461526
z_range: zmin: 0 zmax: 0
Geodetic CRS: WGS 84
mpsz_sf <- st_read(dsn="data/geospatial",
layer="MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21
No invalid geometries is observed in childcare_sf, eldercare_sf, mrtlrt_sf, presch_sf, park_sf and supermarket_sf data frames. However, mpsz_sf data frame consists of 9 invalid geometries. The invalid geometries will be removed in the mpsz_sf data frame using st_make_valid() and check for any invalid geometries again.
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 120 features and 19 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21 / Singapore TM
First 10 features:
fid OBJECTID ADDRESSBLO ADDRESSBUI ADDRESSPOS
1 1 1 <NA> <NA> 601318
2 2 2 <NA> <NA> 462509
3 3 3 <NA> <NA> 640190
4 4 4 <NA> <NA> 190005
5 5 5 <NA> <NA> 160044
6 6 6 <NA> <NA> 160117
7 8 8 <NA> <NA> 731569
8 9 9 <NA> <NA> 651210
9 10 10 <NA> <NA> 540182
10 11 11 <NA> <NA> 228080
ADDRESSSTR ADDRESSTYP DESCRIPTIO
1 318A Jurong East Avenue 1 #02-308 <NA> <NA>
2 Blk 509B Bedok North Street 3 #02-157 <NA> <NA>
3 Blk 190 Boon Lay Drive #01-242 <NA> <NA>
4 5 Beach Rd #02-4943 <NA> <NA>
5 Blk 44 Beo Crescent #01-67 <NA> <NA>
6 Blk 117 Jalan Bukit Merah #01-1683 <NA> <NA>
7 569A Champion Way #01-346 <NA> <NA>
8 210A Bukit Batok St 21 #01-294 <NA> <NA>
9 Blk 182 Rivervale Crescent\n#01-311 <NA> <NA>
10 81 Wilkie Road <NA> <NA>
HYPERLINK LANDXADDRE LANDYADDRE
1 <NA> 0 0
2 <NA> 0 0
3 <NA> 0 0
4 <NA> 0 0
5 <NA> 0 0
6 <NA> 0 0
7 <NA> 0 0
8 <NA> 0 0
9 <NA> 0 0
10 <NA> 0 0
NAME PHOTOURL
1 Yuhua Senior Activity Centre <NA>
2 THK SAC @ Kaki Bukit <NA>
3 THK SAC @ Boon Lay <NA>
4 PEACE-Connect Senior Activity Centre@5 <NA>
5 THK SAC @ Beo Crescent <NA>
6 Silver ACE @ Bukit Merah <NA>
7 Care Corner Senior Activity Centre (WL569) <NA>
8 Fei Yue Senior Activity Centre (Bukit Batok Branch) <NA>
9 COMNET Senior Activity Centre @ 182 Rivervale Crescent <NA>
10 Abdullah Shooker Jewish Welfare Home <NA>
ADDRESSFLO INC_CRC FMEL_UPD_D ADDRESSUNI X_ADDR
1 <NA> 2B0DB92FDD914FFC 2016-07-28 <NA> 16614.08
2 <NA> 82728FA30612F3FD 2016-07-28 <NA> 38803.81
3 <NA> DE7A8D4EA0BD1D9B 2016-07-28 <NA> 14481.92
4 <NA> A2C058FC5785F7FE 2016-07-28 <NA> 31505.35
5 <NA> 9DBFD51E056AEE70 2016-07-28 <NA> 27218.35
6 <NA> 169DABA5B6ECEA87 2016-07-28 <NA> 27278.94
7 <NA> 4DC6800E9BB385BE 2016-07-28 <NA> 23147.94
8 <NA> EFBD712DA5DD6FEC 2016-07-28 <NA> 18820.58
9 <NA> 6BB0D7698D7B4C7D 2016-07-28 <NA> 36446.37
10 <NA> EBA4B3E6F7F5B8F0 2016-07-28 <NA> 29544.78
Y_ADDR geometry
1 36639.12 POINT (16614.08 36639.12)
2 35098.78 POINT (38803.81 35098.78)
3 36357.61 POINT (14481.92 36357.61)
4 31853.52 POINT (31505.35 31853.52)
5 30135.49 POINT (27218.35 30135.49)
6 29350.17 POINT (27278.94 29350.17)
7 45761.17 POINT (23147.94 45761.17)
8 36396.32 POINT (18820.58 36396.32)
9 41376.90 POINT (36446.37 41376.9)
10 31691.08 POINT (29544.78 31691.08)
Simple feature collection with 0 features and 3 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID STN_NAME STN_NO geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 2 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Geodetic CRS: WGS 84
[1] Name Description geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 15 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21
[1] OBJECTID SUBZONE_NO SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
[7] PLN_AREA_C REGION_N REGION_C INC_CRC FMEL_UPD_D X_ADDR
[13] Y_ADDR SHAPE_Leng SHAPE_Area geometry
<0 rows> (or 0-length row.names)
Only eldercare_sf contains missing values. Columns with missing values are removed.
eldercare_sf = select(eldercare_sf, -3, -4, -7,-8, -9, -13, -14, -17)
Checking for missing values again
Simple feature collection with 0 features and 11 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] fid OBJECTID ADDRESSPOS ADDRESSSTR LANDXADDRE LANDYADDRE
[7] NAME INC_CRC FMEL_UPD_D X_ADDR Y_ADDR geometry
<0 rows> (or 0-length row.names)
any(duplicated(childcare_sf))
[1] FALSE
any(duplicated(eldercare_sf))
[1] FALSE
any(duplicated(mrtlrt_sf))
[1] FALSE
any(duplicated(park_sf))
[1] FALSE
any(duplicated(presch_sf))
[1] FALSE
any(duplicated(supermarket_sf))
[1] FALSE
any(duplicated(mpsz_sf))
[1] FALSE
There is no duplicated in the data sets above.
st_crs() of sf package is used to display the coordinate reference system information.
st_crs(childcare_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(eldercare_sf)
Coordinate Reference System:
User input: SVY21 / Singapore TM
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mrtlrt_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
st_crs(park_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(presch_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(supermarket_sf)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: SVY21
wkt:
PROJCRS["SVY21",
BASEGEOGCRS["SVY21[WGS84]",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ID["EPSG",6326]],
PRIMEM["Greenwich",0,
ANGLEUNIT["Degree",0.0174532925199433]]],
CONVERSION["unnamed",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["Degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["(E)",east,
ORDER[1],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]],
AXIS["(N)",north,
ORDER[2],
LENGTHUNIT["metre",1,
ID["EPSG",9001]]]]
From the output above, there are datasets that has EPSG code of 4326. and 9001 The correct coordinate system code for SVY21 is 3414 instead of 4326 and 9001. st_transform() of sf package will be used to assign the correct EPSG code to all data frames.
childcare_sf <- st_transform(childcare_sf, 3414)
eldercare_sf <- st_transform(eldercare_sf, 3414)
mrtlrt_sf <- st_transform(mrtlrt_sf, 3414)
park_sf <- st_transform(park_sf, 3414)
presch_sf <- st_transform(presch_sf, 3414)
supermarket_sf <- st_transform(supermarket_sf, 3414)
mpsz_sf <- st_transform(mpsz_sf, 3414)
Checking CRS again
st_crs(childcare_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(eldercare_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mrtlrt_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(park_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(presch_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(supermarket_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
st_crs(mpsz_sf)
Coordinate Reference System:
User input: EPSG:3414
wkt:
PROJCRS["SVY21 / Singapore TM",
BASEGEOGCRS["SVY21",
DATUM["SVY21",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4757]],
CONVERSION["Singapore Transverse Mercator",
METHOD["Transverse Mercator",
ID["EPSG",9807]],
PARAMETER["Latitude of natural origin",1.36666666666667,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8801]],
PARAMETER["Longitude of natural origin",103.833333333333,
ANGLEUNIT["degree",0.0174532925199433],
ID["EPSG",8802]],
PARAMETER["Scale factor at natural origin",1,
SCALEUNIT["unity",1],
ID["EPSG",8805]],
PARAMETER["False easting",28001.642,
LENGTHUNIT["metre",1],
ID["EPSG",8806]],
PARAMETER["False northing",38744.572,
LENGTHUNIT["metre",1],
ID["EPSG",8807]]],
CS[Cartesian,2],
AXIS["northing (N)",north,
ORDER[1],
LENGTHUNIT["metre",1]],
AXIS["easting (E)",east,
ORDER[2],
LENGTHUNIT["metre",1]],
USAGE[
SCOPE["Cadastre, engineering survey, topographic mapping."],
AREA["Singapore - onshore and offshore."],
BBOX[1.13,103.59,1.47,104.07]],
ID["EPSG",3414]]
All dataframes has the EPSG code of 3414 now.
st_geometry(childcare_sf)
Geometry set for 1545 features
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 11203.01 ymin: 25667.6 xmax: 45404.24 ymax: 49300.88
z_range: zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(eldercare_sf)
Geometry set for 120 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 14481.92 ymin: 28218.43 xmax: 41665.14 ymax: 46804.9
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(mrtlrt_sf)
Geometry set for 171 features
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 6138.311 ymin: 27555.06 xmax: 45254.86 ymax: 47854.2
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(park_sf)
Geometry set for 350 features
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 12373.11 ymin: 21869.93 xmax: 46735.95 ymax: 49231.09
z_range: zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(presch_sf)
Geometry set for 1925 features
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 11203.01 ymin: 25596.33 xmax: 45404.24 ymax: 49300.88
z_range: zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(supermarket_sf)
Geometry set for 526 features
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: 4901.188 ymin: 25529.08 xmax: 46948.22 ymax: 49233.6
z_range: zmin: 0 zmax: 0
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
st_geometry(mpsz_sf)
Geometry set for 323 features
Geometry type: GEOMETRY
Dimension: XY
Bounding box: xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
All geometry type are points except for mpsz_sf which
For childcare and eldercare
tmap_mode('view')
tm_shape(mpsz_sf) +
tm_polygons() +
tm_shape(childcare_sf) +
tm_dots(col="red") +
tm_shape(eldercare_sf) +
tm_dots(col="blue") +
tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')
For mrtlrt and park
tmap_mode('view')
tm_shape(mpsz_sf) +
tm_polygons() +
tm_shape(mrtlrt_sf) +
tm_dots(col="red") +
tm_shape(park_sf) +
tm_dots(col="blue") +
tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')
For pre school and supermarket
tmap_mode('view')
tm_shape(mpsz_sf) +
tm_polygons() +
tm_shape(presch_sf) +
tm_dots(col="red") +
tm_shape(supermarket_sf) +
tm_dots(col="blue") +
tm_view(set.zoom.limits = c(11,14))
tmap_mode('plot')
Importing the dataset into R dataframe by using read_csv() function of readr package in tidyverse:
hdb_resale <- read_csv("data/aspatial/resale-flat-prices-based-on-registration-date-from-jan-2017-onwards.csv")
Use glimpse() to learn more about the attribute information in the data frame.
glimpse(hdb_resale)
For this Take-Home Exercise, we will only focus on four-room flats transacted from 1st January 2019 to 30th September 2020. Thus, the data set will be filtered according to the focus.
filter() function of dplyr package in tidyverse is used to filter the data of 4-room flats from 1st January 2019 to 30th September 2020
mutate() function of dplyr package in tidyverse is used to add two new columns to hdbresale_jan19sep20: address (consisting of block and street name) and remaininglease_years (remaining lease in terms of years only)
hdb_resale_jan19sep20 <- hdb_resale %>%
filter(month %in% c("2019-01", "2019-02", "2019-03", "2019-04", "2019-05", "2019-06", "2019-07", "2019-08", "2019-09", "2019-10", "2019-11", "2019-12","2020-01", "2020-02", "2020-03", "2020-04", "2020-05", "2020-06", "2020-07", "2020-08", "2020-09")) %>%
filter(flat_type == "4 ROOM")
remaining_lease_list <- strsplit(hdb_resale_jan19sep20$remaining_lease, " ")
hdb_resale_jan19sep20$remaininglease_years <- 0
for (x in 1:length(remaining_lease_list)) {
if (length(unlist(remaining_lease_list[x])) > 2) {
year <- as.numeric(unlist(remaining_lease_list[x])[1])
month <- as.numeric(unlist(remaining_lease_list[x])[3])
hdb_resale_jan19sep20$remaininglease_years[x] <- year + round(month/12, 2)
}
else {
year <- as.numeric(unlist(remaining_lease_list[x])[1])
hdb_resale_jan19sep20$remaininglease_years[x] <- year
}
}
hdb_resale_data <- hdb_resale_jan19sep20 %>%
mutate(hdb_resale_jan19sep20, address = paste(block, street_name)) %>%
select(month, town, flat_type, address, block, street_name, storey_range, floor_area_sqm, flat_model, lease_commence_date, remaining_lease, remaininglease_years, resale_price)
hdb_resale_data
# A tibble: 15,901 x 13
month town flat_type address block street_name storey_range
<chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 2019-01 ANG MO KIO 4 ROOM 204 AN~ 204 ANG MO KIO~ 01 TO 03
2 2019-01 ANG MO KIO 4 ROOM 175 AN~ 175 ANG MO KIO~ 07 TO 09
3 2019-01 ANG MO KIO 4 ROOM 543 AN~ 543 ANG MO KIO~ 01 TO 03
4 2019-01 ANG MO KIO 4 ROOM 118 AN~ 118 ANG MO KIO~ 04 TO 06
5 2019-01 ANG MO KIO 4 ROOM 411 AN~ 411 ANG MO KIO~ 04 TO 06
6 2019-01 ANG MO KIO 4 ROOM 546 AN~ 546 ANG MO KIO~ 10 TO 12
7 2019-01 ANG MO KIO 4 ROOM 428 AN~ 428 ANG MO KIO~ 07 TO 09
8 2019-01 ANG MO KIO 4 ROOM 173 AN~ 173 ANG MO KIO~ 04 TO 06
9 2019-01 ANG MO KIO 4 ROOM 607 AN~ 607 ANG MO KIO~ 10 TO 12
10 2019-01 ANG MO KIO 4 ROOM 603 AN~ 603 ANG MO KIO~ 07 TO 09
# ... with 15,891 more rows, and 6 more variables:
# floor_area_sqm <dbl>, flat_model <chr>,
# lease_commence_date <dbl>, remaining_lease <chr>,
# remaininglease_years <dbl>, resale_price <dbl>
There is a total of 15901 4-room flat transactions from 1st January 2019 to 30 September 2020 after filtering.
It is noticed that street name such as “ST. GEORGE’S” has problem when obtaining the latitude and longitude in the later part. The short form “ST.” is replaced with its full name “SAINT”.
hdb_resale_data$address <- sub('ST\\.','SAINT',hdb_resale_data$address)
To get the coordinates for each 4-room flat, the address column in hdb_resale_data is used in OneMap Search API which returns search results with both latitude and longitude. Obtain the result using GET() function of httr package and add the values of the new columns, LATITUDE and LONGITUDE respectively into each row of hdb_resale_data.
for (x in 1:nrow(hdb_resale_data)) {
address <- hdb_resale_data[x,'address']
url = paste('https://developers.onemap.sg/commonapi/search?searchVal=', address, '&returnGeom=Y&getAddrDetails=Y&pageNum=1')
latlong_url <- URLencode(url)
latlong <- GET(latlong_url)
jsonlatlong <- content(latlong,as="parsed")
if (length(jsonlatlong$results) > 0) {
hdb_resale_data[x,'LATITUDE'] = jsonlatlong$results[[1]]$LATITUDE
hdb_resale_data[x,'LONGITUDE'] = jsonlatlong$results[[1]]$LONGITUDE
}
}
is.na() is used to check for the total number of missing value (NA)
There is no missing data observed.
To avoid going through the long process above again, data will be exported to the new csv.
write.csv(hdb_resale_data, "data/aspatial/hdb_resale_latlong.csv", row.names = FALSE)
resale_data <- read_csv("data/aspatial/hdb_resale_latlong.csv")
Obtain from the column remaininglease_years which was converted from remaining_lease into years as a unit instead of years and months.
lease <- resale_data$remaininglease_years
The floor level is provided in a range instead of individual levels. unique() of the base function is used to identify the unique level range.
There is 17 unique floor level range.
resale_data_sf <- st_as_sf(resale_data,
coords = c("LONGITUDE", "LATITUDE"), crs=4326) %>%
st_transform(crs = 3414)
Create a function to get the distance between the HDB Flat and factors mentioned in the dataset using st_distance() from sf package and dist() is used to calculate the euclidean distance
proximity <- function(df_1, df_2, var) {
df_1_geometry <- st_coordinates(df_1$geometry)
df_2_geometry <- st_coordinates(df_2$geometry)
df_2_geometry <- df_2_geometry[,c(1,2)]
dist_matrix <- spDists(df_1_geometry, df_2_geometry, longlat=FALSE)
dist_matrix_df <- data.frame(dist_matrix)
dist_matrix_min <- dist_matrix_df %>% rowwise() %>% mutate(Min = min(c_across(1:(ncol(dist_matrix_df)))))
df_1[,var] <- dist_matrix_min$Min/1000
return(df_1)
}
resale_data_sf <- proximity(resale_data_sf, eldercare_sf, "PROX_ELDERCARE")
resale_data_sf <- proximity(resale_data_sf, park_sf, "PROX_PARK")
resale_data_sf <- proximity(resale_data_sf, childcare_sf, "PROX_CHILDCARE")
resale_data_sf <- proximity(resale_data_sf, mrtlrt_sf, "PROX_MRTLRT")
resale_data_sf <- proximity(resale_data_sf, presch_sf, "PROX_PRESCH")
resale_data_sf <- proximity(resale_data_sf, supermarket_sf, "PROX_SUPERMARKET")
resale_data_sf = select(resale_data_sf, -1, -2, -3, -5, -6, -9, -10, -11)
resale_data_sf = select(resale_data_sf, -10, -11)
st_write(resale_data_sf, "data/geospatial/resale_data_final_neww.shp")
resale_data_sf <- st_read(dsn="data/geospatial",
layer="resale_data_final_neww")
Reading layer `resale_data_final_neww' from data source
`C:\nxinyan\IS415\IS415_blog-2\_posts\2021-11-05-take-home-exercise-3\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 15901 features and 11 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 11597.31 ymin: 28217.39 xmax: 42623.63 ymax: 48741.06
Projected CRS: SVY21 / Singapore TM
resale_data_sf <- resale_data_sf %>%
rename("AREA_SQM" = "flr_r_s", "remaininglease_years" = "rmnngl_",
"resale_price"= "rsl_prc", "PROX_PARK" = "PROX_PA", "PROX_CHILDCARE" = "PROX_CH", "PROX_MRTLRT" = "PROX_MR", "PROX_PRESCH" = "PROX_PR", "PROX_ELDERCARE" = "PROX_EL", "PROX_SUPERMARKET" = "PROX_SU")
Plotting the distribution of resale_price by using appropriate Exploratory Data Analysis (EDA)
ggplot(data=resale_data_sf, aes(x=`resale_price`)) +
geom_histogram(bins=20, color="black", fill="light blue")

The figure above reveals a right skewed distribution. This means that more 4 room type units were transacted at relative lower prices.
Statistically, the skewed distribution can be normalised by using log transformation. mutate() of dplyr package is used too add a new variable called log_resale_price using a log transformation on the variable resale_price.
resale_data_sf <- resale_data_sf %>%
mutate(`log_resale_price` = log(resale_price))
Plotting the distribution of log_resale_price
ggplot(data=resale_data_sf, aes(x=`log_resale_price`)) +
geom_histogram(bins=20, color="black", fill="light blue")

The distribution is relatively less skewed after the transformation.
Drawing 12 small multiple histograms (also known as trellis plot) by using ggarrange() of ggpubr package.
resale_data_sf$remaininglease_years <- as.numeric(resale_data_sf$remaininglease_years)
AREA_SQM <- ggplot(data=resale_data_sf, aes(x= `AREA_SQM`)) +
geom_histogram(bins=20, color="black", fill="light blue")
remaininglease_years <- ggplot(data=resale_data_sf, aes(x= `remaininglease_years`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_CHILDCARE <- ggplot(data=resale_data_sf, aes(x= `PROX_CHILDCARE`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_ELDERCARE <- ggplot(data=resale_data_sf, aes(x= `PROX_ELDERCARE`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_SUPERMARKET <- ggplot(data=resale_data_sf, aes(x= `PROX_SUPERMARKET`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_PRESCH <- ggplot(data=resale_data_sf, aes(x= `PROX_PRESCH`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_MRTLRT <- ggplot(data=resale_data_sf, aes(x= `PROX_MRTLRT`)) +
geom_histogram(bins=20, color="black", fill="light blue")
PROX_PARK <- ggplot(data=resale_data_sf, aes(x= `PROX_PARK`)) +
geom_histogram(bins=20, color="black", fill="light blue")
ggarrange(remaininglease_years, PROX_PRESCH, PROX_PARK, PROX_MRTLRT, PROX_CHILDCARE, AREA_SQM, PROX_ELDERCARE, PROX_SUPERMARKET, ncol = 3, nrow = 3)

The distribution of the majority of the independent variables are right skewed, only remaininglease_years is slightly left skewed.
Revealing the geospatial distribution 4-room type unit resale prices in Singapore using tmap package
tmap_mode("view")
tm_shape(mpsz_sf)+
tm_polygons() +
tm_shape(resale_data_sf) +
tm_dots(col = "resale_price",
alpha = 0.6,
style="quantile") +
tm_view(set.zoom.limits = c(11,14))